METHOD AND APPARATUS IN A TELECOMMUNICATIONS SYSTEM 



TECHNICAL FIELD OF THE INVENTION

The present invention relates generally to methods for
improving speech quality in e.g. IP-telephony systems. More
particularly, the present invention relates to a method for
reducing audio artefacts due to overrun or underrun in a
playout buffer.



The invention also relates to an arrangement for carrying out 
the method. 



DESCRIPTION OF RELATED ART 



When sampling frequencies, in e.g. a speech coding system, are 
not controlled, underrun or overrun might occur in the playout 
buffer, which is a buffer storing speech samples for later 
playout. Underrun means that the playout buffer will run into 
starvation, i.e. it will no longer have any samples to play on 
the output. Overrun means that the playout buffer will be 
filled with samples and that following samples cannot be 
buffered and consequently will be lost. Underrun is probably 
more common than overrun since the size of the playout buffer 
can increase until there is no memory left, while it only can 
decrease until there are no samples left. 



Currently, most systems do not deal with the problem that the
sampling frequency might differ considerably between the
sending and the receiving side. One possible solution, proposed
in EP-0680033 A2, works on pitch periods. Adding or removing
pitch periods in the speech signal achieves a different
duration of a speech segment without affecting speech
characteristics other than speed. This proposed solution might
be used as an indirect sample rate conversion method.




Another solution uses the beginning of talkspurts as an
indication to reset the playout buffer to a specified level.
The distance, in number of samples, between two consecutive
talkspurts is increased if the receiving side is playing faster
than the sending side and decreased if the receiving side is
playing slower than the sending side. In IP-telephony
solutions using the IP/UDP/RTP protocols (Internet
Protocol/User Datagram Protocol/Real Time Protocol), the marker
flag in the RTP header is used to identify the beginning of a
talkspurt. At the beginning of a talkspurt the playout buffer
is set to a suitable size.



The solution according to EP-0680033 A2, where pitch periods
are removed or inserted, assumes a fixed conversion factor
between the receiving and transmitting side. Therefore it
cannot be used in dynamical systems, i.e. where the sampling
frequencies vary. Further, it does not solve the problem with
underrun or overrun situations, but is instead focused on
changing the playback rate of a speech signal stored in
compressed form for playback later and at a different speed
compared to when it was stored.

Using the method of resetting the playout buffer to a certain
size causes problems if there are very long talkspurts, e.g.
broadcast from one speaker to several listeners. Since the
length of a talkspurt is not defined in the beginning of the
talkspurt, the size to reset to might be either too small or too
large. If it is too small, underrun will occur, and if it is too
large, unnecessary delay is introduced; thus the problem
persists.




The general problem with the currently known approaches is that
they are static and inflexible. As a conclusion, dynamic
solutions are required.






SUMMARY OF THE INVENTION 



The present invention deals with the problem of improving 
speech quality in systems where the sampling rate at a 
transmitting terminal differs from the playout rate of a 
receiving buffer at a receiving terminal. This is often the 
case in e.g. IP-telephony.

When sampling frequencies are not controlled, underrun or
overrun might occur in the playout buffer at the receiving
side, which causes audible artefacts in the speech signal. To
avoid said overrun or underrun there is a need for dynamically
keeping the playout buffer to an average size, i.e. controlling
the fullness of the playout buffer.

One object of the present invention is thus to provide a method
for reducing audio artefacts in a speech signal due to overrun
or underrun in the playout buffer.

A further object of the invention is to dynamically control the
fullness of the playout buffer so as not to introduce extra delay.

The above mentioned objects are achieved by means of dynamic
sample rate conversion of speech frames, i.e. converting speech
frames comprising N samples to instead comprise either N+1 or
N-1 samples. More specifically, the invention works on an
LPC-residual of the speech frame, and by adding or removing a
sample in the LPC-residual a sample rate conversion will be
achieved. The LPC-residual is the output from an LPC-filter, which
removes the short-term correlation from the speech signal. The
LPC-filter is a linear predictive coding filter where each
sample is predicted as a linear combination of previous
samples.

By using the proposed sample rate conversion method, the
playout buffer of e.g. an IP-telephony terminal can be
continuously controlled with only small audible artefacts. Since
the method works on a sample-by-sample basis, the playout
buffer can be kept to a minimum and hence no extra delay is
introduced. The solution also has very low complexity,
especially when the LPC-residual is already available, which is
the case in e.g. a speech decoder.

The term "comprises/comprising" when used in this specification
is taken to specify the presence of stated features, integers,
steps or components but does not preclude the presence or
addition of one or more other features, integers, steps,
components or groups thereof.

Although the invention has been summarised above, the method and
arrangement according to the appended independent claims 1 and
23 define the scope of the invention. Various embodiments are
further defined in the dependent claims 2-12 and 24-



BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a transmitter and a receiver to which the
method of the invention can be applied.

Figure 2 shows a speech signal in the time domain.

Figure 3 shows an LPC-residual of a speech signal in the time
domain.

Figure 4 illustrates four modules of the sample rate
conversion method according to the invention.

Figure 5A shows an analysis-by-synthesis speech encoder with
LTP-filter.

Figure 5B shows an analysis-by-synthesis speech encoder with
adaptive codebook.

Figures 5C-5F show different implementations of the
LPC-residual extraction depending on the realisation of
the speech encoder.

Figures 5G-5J show four ways of placing the sample rate
conversion within the feedback loop of the speech
decoder.

Figure 6 illustrates how to use information about pitch
pulses to find samples with low energy.

Figure 7 illustrates LPC-history extension.

Figure 8 illustrates copying of the history of the LPC
residual.

DETAILED DESCRIPTION 

The present invention describes, referring to figure 1, a
method for improving speech quality in a communication system
comprising a first terminal unit TRX1 transmitting speech
signals having a first sample frequency F1 and a second
terminal unit TRX2 receiving said speech signals, buffering
them in a playout buffer 100 with said first frequency F1 and
playing out from said playout buffer with a second frequency
F2. When the buffering frequency F1 is larger than the playout
frequency F2, the playout buffer 100 will eventually be filled
with samples and subsequent samples will have to be discarded.
When the buffering frequency F1 is lower than the playout
frequency F2, the playout buffer will run into starvation, i.e.
it will no longer have any samples to play on the output. These
two problems are called overrun and underrun respectively, and
cause audible artefacts like popping and clicking sounds in
the speech signal.
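
The drift between the two frequencies can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name, the one-second time step, the buffer capacity and the example frequencies are all hypothetical and do not come from the description.

```python
def simulate_playout_buffer(f1_hz, f2_hz, seconds, start_fill, capacity):
    """Track a playout buffer written at f1_hz and read at f2_hz.

    Returns (event, time_in_seconds), where event is "overrun",
    "underrun" or "ok". All parameters are illustrative assumptions.
    """
    fill = start_fill
    for t in range(1, seconds + 1):
        fill += f1_hz - f2_hz          # net samples gained per second
        if fill > capacity:
            return "overrun", t        # buffer full: new samples are lost
        if fill < 0:
            return "underrun", t       # buffer empty: nothing left to play
    return "ok", seconds


# A sender clock only 1 Hz fast fills a 320-sample buffer that starts
# half full after 161 seconds; a sender clock 1 Hz slow starves it.
print(simulate_playout_buffer(8001, 8000, 600, 160, 320))
print(simulate_playout_buffer(7999, 8000, 600, 160, 320))
```

Even a very small frequency mismatch therefore leads to overrun or underrun within minutes unless the fill level is controlled.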

The above problems with underrun and overrun are solved by
using dynamic sample rate conversion based on modifying the
LPC-residual of the speech signal, and will be described
with reference to figures 2-8.

Figure 2 shows a typical segment of a speech signal in the time
domain. This speech signal shows a short-term correlation,
which corresponds to the vocal tract, and a long-term
correlation, which corresponds to the vocal cords. The
short-term correlation can be predicted by using an LPC-filter and
the long-term correlation can be predicted by using an
LTP-filter. LPC means linear predictive coding and LTP means long
term prediction. Linear in this case implies that the
prediction is a linear combination of previous samples of the
speech signal.



The LPC-filter is usually denoted:

H(z) = 1 - \sum_{i=1}^{n} a_i z^{-i}



By feeding a speech frame through the LPC-filter, H(z), the
LPC-residual is found. The LPC-residual, shown in figure 3,
contains pitch pulses P generated by the vocal cords. The
distance L between two pitch pulses P is called lag. The pitch
pulses P are also predictable, and since they represent the
long-term correlation of the speech signal they are predicted
through an LTP-filter given by the distance L between the pitch
pulses P and the gain b of a pitch pulse P. The LTP-filter is
usually denoted:



F(z) = b z^{-L}



When the LPC-residual is fed through the inverse of the
LTP-filter F(z), an LTP-residual is created. In the LTP-residual the
long-term correlation in the LPC-residual is removed, giving
the LTP-residual a noise-like appearance.
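
The two prediction steps can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the coefficient values in the usage example are made up, and samples before the start of the frame are assumed to be zero, which the description does not specify.

```python
def lpc_residual(speech, coeffs):
    """r(n) = s(n) - sum_i a_i * s(n - i); earlier samples assumed zero."""
    residual = []
    for n, s in enumerate(speech):
        prediction = sum(a * speech[n - i]
                         for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        residual.append(s - prediction)
    return residual


def lpc_synthesis(residual, coeffs):
    """Inverse operation: s(n) = r(n) + sum_i a_i * s(n - i)."""
    speech = []
    for n, r in enumerate(residual):
        prediction = sum(a * speech[n - i]
                         for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        speech.append(r + prediction)
    return speech


def ltp_residual(lpc_res, gain_b, lag_l):
    """Remove the long-term correlation: e(n) = r(n) - b * r(n - L)."""
    return [r - (gain_b * lpc_res[n - lag_l] if n >= lag_l else 0.0)
            for n, r in enumerate(lpc_res)]


# A decaying signal that a first-order predictor explains completely.
frame = [1.0, 0.5, 0.25, 0.125]
res = lpc_residual(frame, [0.5])
print(res)                          # [1.0, 0.0, 0.0, 0.0]
print(lpc_synthesis(res, [0.5]))    # [1.0, 0.5, 0.25, 0.125]
```

The round trip shows why the residual can be edited and then passed back through the synthesis step: any sample added to or removed from `res` changes the length of the reconstructed frame by exactly one sample.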



The solution according to the invention modifies the
LPC-residual, shown in figure 3, on a sample-by-sample basis. That
is, an LPC-residual block comprising N samples is converted to
an LPC-residual block comprising either N+1 or N-1 samples. The
LPC-residual contains less information and less energy compared
to the speech signal but the pitch pulses P are still easy to
locate. When modifying the LPC-residual, samples close to
a pitch pulse P should be avoided, because these samples
contain more information and thus have a large influence on the
speech synthesis. The LTP-residual is not as suitable as the
LPC-residual to use for modification since the pitch pulse
positions P are no longer available. As a conclusion, the
LPC-residual is better suited for modification compared to both the
speech signal and the LTP-residual, since the pitch pulses P
are easily located in the LPC-residual.



The proposed sample rate conversion consists of four modules,
shown in figure 4:

1) A Sample Rate Controller (SRC) module 400 that calculates
whether a sample should be added or removed;

2) LPC-Residual Extraction (LRE) modules 410 that are used to
obtain the LPC-residual r_LPC;

3) Sample Rate Conversion Methods (RCM) modules 420 that find the
position where to add or remove samples and how to perform
the insertion and deletion, i.e. converting the LPC-residual
block r_LPC comprising N samples to a modified LPC-residual
block r'_LPC comprising N+1 or N-1 samples;

4) A Speech Synthesiser Module (SSM) 430 to reproduce the
speech.



The basic idea of the invention is that it is possible to change
the playout rate of the playout buffer 440 by removing or
adding samples in the LPC-residual.




The SRC module 400 decides whether samples should be added or
removed in the LPC-residual. This is done on the basis of
at least one of the following parameters: the sampling
frequencies of the sending terminal TRX1 and the receiving terminal
TRX2, information about the speech signal, e.g. a voice activity
signal, status of the playout buffer, or an indicator
of the beginning of a talkspurt. These inputs are named SRC
Inputs in the figure. On the basis of a function of one or
several of these parameters the SRC 400 forms a decision on
when to insert or remove a sample in the LPC residual and
optionally which RCM 420 to use. Since digital processing of
speech signals usually is made on a frame-by-frame basis, the
decision on when to remove or add samples basically is to
decide within which LPC-residual frame the RCM shall
insert or remove a sample.
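
As an illustration of such a decision function, the sketch below uses only one of the listed inputs, the status of the playout buffer. The function name, the target level and the tolerance band are hypothetical; the description leaves the actual function open.

```python
def src_decision(buffer_fill, target_fill, tolerance):
    """Return +1 to add a sample, -1 to remove one, 0 to leave the frame.

    Adding a sample makes the synthesised frame one sample longer, which
    lets a draining buffer recover; removing one makes the frame shorter,
    which drains a buffer that is filling up.
    """
    if buffer_fill < target_fill - tolerance:
        return +1
    if buffer_fill > target_fill + tolerance:
        return -1
    return 0


# One decision per frame: the buffer is 60 samples below its target here.
print(src_decision(buffer_fill=100, target_fill=160, tolerance=20))  # 1
```

A tolerance band around the target keeps the controller from toggling between insertion and deletion when the fill level only fluctuates slightly.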



There are basically three methods of obtaining the LPC-residual
that is needed as input to the RCMs 420. The methods
depend on the implementation of the speech encoder and will be
described with reference to figures 5A-5F. The LRE solution also
directly influences the SSM solution, which will become apparent
below.



In figure 5A an analysis-by-synthesis speech encoder 500
with LTP-filter 540 is shown. This is a hybrid encoder where the
vocal tract is described with an LPC-filter 550 and the vocal
cords are described with an LTP-filter 540, while the
LTP-residual r_LTP(n) is waveform-compared with a set of more or less
stochastic codebook vectors from the fixed codebook 530. The
input signal S is divided into frames 510 with a typical length
of 10-30 ms. For each frame an LPC-filter 550 is calculated
through an LPC-analysis 520, and the LPC-filter 550 is included
in a closed loop to find the parameters of the LTP-filter 540.
The speech decoder 580 is included in the encoder and consists
of the fixed codebook 530, whose output r̂_LTP(n) is connected to
the LTP-filter 540, whose output r̂_LPC(n) is connected to the
LPC-filter 550 generating an estimate ŝ(n) of the original speech
signal s(n). Each estimated signal ŝ(n) is compared with the
original speech signal s(n) and a difference signal e(n) is
calculated. The difference signal e(n) is then weighted 560 to
calculate a perceptual weighted error measure e_w(n). The set of
parameters that gives the least perceptual weighted error
measure e_w(n) is transmitted to the receiving side 570.

As can be seen in figure 5C, the LPC-residual r̂_LPC(n) is the
output from the LTP-filter 540. The SRC/RCM modules 545 can
thus be connected directly to that output and integrated into
the speech encoder. The LRE consists of the fixed codebook 530
and the long-term predictor 540 and the SSM consists of an
LPC-filter 550; thus the LRE-module and the SSM-module are natural
parts of the speech decoder.



If the speech encoder, on the other hand, is an
analysis-by-synthesis speech encoder where the LTP-filter 540 is exchanged
for an adaptive codebook 590 as shown in figure 5B, the
LPC-residual r̂_LPC(n) is the output from the sum of the adaptive and
the fixed codebook 590 and 530. All other elements have the
same function as in figure 5A showing the analysis-by-synthesis
speech encoder with LTP-filter 500. As can be seen in figure 5D,
the LPC residual r̂_LPC(n) is the sum of the output from the
adaptive and fixed codebook 590 and 530. The SRC/RCM modules
545 can thus again be connected directly to that output and
integrated into the speech encoder as shown in figure 5D. The
LRE consists of the adaptive and the fixed codebook 590 and 530
and the SSM consists of an LPC-filter 550; thus the LRE module
and the SSM module are again natural parts of the speech
decoder.




When the speech encoder has some sort of backward adaptation,
it is not feasible to make alterations in the LPC-residual
since this would affect the adaptation process in a detrimental
way. Figure 5E shows how, in these cases, the parameters and the
output ŝ(n) from the LPC-filter 550 could be fed to an inverse
LPC-filter 525 placed after the speech decoder. After the sample
rate conversion has been made in the SRC/RCM modules 545, an
LPC-filtering 550 is performed to reproduce the speech signal.
The LRE module consists of the inverse LPC-filter 525 and the
SSM module consists of the LPC-filter 550.



Figure 5F shows how it is possible to produce an LPC
residual r̂_LPC(n) through a full LPC analysis. The output ŝ(n)
from the speech decoder is fed to both an LPC analysis block
520 and an LPC inverse filter 525. After the sample rate
conversion has been made in the SRC/RCM modules 545, an LPC
filtering 550 is performed to reproduce the speech signal. The
LRE consists in this case of the LPC analysis 520 and
the LPC inverse filter 525, and the SSM module consists of the
LPC filter 550. Performing an LPC analysis is considered to be
well known to a person skilled in the art and is therefore not
discussed any further.
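
For readers who want a concrete picture of the analysis step, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion. This is one common choice and is an assumption here; the description does not say which analysis method is used, and the function names and the example frame are hypothetical.

```python
def autocorrelation(frame, order):
    """R(0)..R(order) of one frame, computed without windowing."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_order for s_hat(n) = sum_i a_i * s(n - i)."""
    a = [0.0] * (order + 1)            # a[0] is unused
    err = r[0]                         # prediction error energy so far
    for i in range(1, order + 1):
        if err == 0.0:
            break                      # silent frame: keep remaining zeros
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


frame = [1.0, 0.8, 0.3, -0.2, -0.5, -0.4, 0.0, 0.3]
coeffs = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs)  # two coefficients for a second-order predictor
```

The resulting coefficients would parameterise both the inverse filter that extracts the residual and the synthesis filter that reproduces the speech after the conversion.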



Referring again to figure 4, assume that the SRC-module 400 has
decided that a sample should be added or removed in the LPC
residual and that the LRE module 410 has produced an LPC
residual. The RCM-module 420 then has to find the exact
position where to add or remove a sample and perform the
adding or removing. There are four different methods for the
RCM-module 420 to find the insertion or deletion point.




The first and most primitive method arbitrarily removes or adds
a sample whenever this becomes necessary. If the sample rate
difference between the terminals is small this will only lead
to minor artefacts since the adding or removing is performed
very seldom.



By inserting or removing samples at positions where the energy
in the LPC-residual is low the synthesis will be less affected.
This is due to the fact that segments close to pitch pulses
will then be avoided. To find these segments of low energy
either a sliding window method or a simpler block energy
analysis can be used.



The second method, called the sliding window energy method, 
calculates a weighted energy value for each sample in the LPC- 
residual. This is done by multiplying k samples surrounding a 
sample with a window function of size k (k << N), where N equals
the number of samples in the LPC-residual. Each sample is then 
squared and the sum of the resulting k values is calculated. 
The window is shifted one position and the procedure is 
repeated. The position where to insert or remove samples is 
given by the sample with the lowest weighted energy value. 
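
The sliding window energy method can be sketched as follows. The function name and the window weights are hypothetical, and the sketch only evaluates positions where the window fits entirely inside the frame, an edge-handling choice the description does not specify.

```python
def sliding_window_energy_position(residual, window):
    """Index of the sample with the lowest weighted energy.

    `window` holds k weights, with k odd and much smaller than
    len(residual). Each surrounding sample is multiplied by its weight,
    squared, and the k values are summed.
    """
    half = len(window) // 2
    best_pos, best_energy = None, float("inf")
    for pos in range(half, len(residual) - half):
        energy = sum((w * residual[pos - half + j]) ** 2
                     for j, w in enumerate(window))
        if energy < best_energy:
            best_pos, best_energy = pos, energy
    return best_pos


# Two pitch-pulse-like peaks at indices 1 and 5; the quiet spot is index 3.
residual = [0.0, 5.0, 0.1, 0.0, 0.1, 4.0, 0.2, 0.3]
print(sliding_window_energy_position(residual, [0.5, 1.0, 0.5]))  # 3
```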



The third method, block energy analysis, is a simpler solution 
for finding the insertion or deletion point. The LPC-residual 
is simply divided into blocks of equal length and an arbitrary
sample is removed or added in the block with the lowest energy.
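
A minimal sketch of the block energy analysis is given below. The function name is hypothetical, the sketch returns the first index of the chosen block as its "arbitrary" sample, and a trailing partial block is ignored; both choices are assumptions.

```python
def block_energy_position(residual, block_len):
    """Start index of the lowest-energy block of length block_len."""
    best_start, best_energy = 0, float("inf")
    for start in range(0, len(residual) - block_len + 1, block_len):
        energy = sum(x * x for x in residual[start:start + block_len])
        if energy < best_energy:
            best_start, best_energy = start, energy
    return best_start


# Block energies are 10.0, 0.05, 8.0 and 0.5, so the second block wins.
print(block_energy_position([3.0, 1.0, 0.1, 0.2, 2.0, 2.0, 0.5, 0.5], 2))  # 2
```

Compared with the sliding window, this needs only one energy sum per block rather than one per sample, at the cost of a coarser position.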



The fourth method, as illustrated in figure 6, uses knowledge
about the position P of a pitch pulse, and the lag L between
two pitch pulses. With knowledge about that, it is possible to
calculate a position P' having low energy and where it is
therefore appropriate to add or remove a sample. The new
position P' can be expressed as P' = P + kL, where the constant k
is selected so that P' is somewhere in the
middle between two pitch pulses, thus avoiding positions with
high energy. A typical value of k is in the range of 0.5 to
0.8.
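
The position calculation of the fourth method can be written out directly. The function name, the rounding to an integer sample index, the default k = 0.65 and the check against the frame length are assumptions added for illustration.

```python
def pitch_based_position(pitch_pos, lag, frame_len, k=0.65):
    """Candidate position P' = P + k*L, or None if it falls outside the frame."""
    candidate = pitch_pos + int(round(k * lag))
    return candidate if 0 <= candidate < frame_len else None


# Pitch pulse at sample 40, lag 60 samples, frame of 160 samples.
print(pitch_based_position(40, 60, 160, k=0.5))   # 70
print(pitch_based_position(140, 60, 160, k=0.5))  # None
```

Because P and L are typically already known in a speech decoder, this method needs no energy calculation at all.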



When the RCM-module 420 has calculated the position where to
add or remove a sample, it must be determined how to perform the
insertion or deletion. There are three methods of performing
such insertion or deletion depending on the type of LRE-module
used.

In the first method either zeros are added or samples with
small amplitudes are removed. This method can be used for all
LRE solutions described above, see figures 5C-5F. Notice that in
figures 5C and 5D the SRC/RCM-modules are placed before the
synthesis filter SSM, but after the feedback of the LPC
residual to the LTP-filter 540 or the adaptive codebook
590, respectively.
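
The first method amounts to two list operations on the residual block. The sketch below uses hypothetical function names and assumes the position has already been chosen by one of the position-finding methods.

```python
def insert_zero(residual, position):
    """Return a copy with a zero inserted at `position` (N -> N+1 samples)."""
    return residual[:position] + [0.0] + residual[position:]


def remove_sample(residual, position):
    """Return a copy with the sample at `position` removed (N -> N-1 samples)."""
    return residual[:position] + residual[position + 1:]


block = [0.9, 0.1, 0.02, 0.1, 0.8]
print(insert_zero(block, 2))    # [0.9, 0.1, 0.0, 0.02, 0.1, 0.8]
print(remove_sample(block, 2))  # [0.9, 0.1, 0.1, 0.8]
```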



In the second method insertion is carried out by adding zeros
and interpolating surrounding samples. Deletion is performed by
removing samples and preferably smoothing surrounding samples.
This method can also be used for all of the LRE solutions
described above, see figures 5C-5F. Notice that in figures 5C
and 5D the SRC/RCM-modules are placed before the synthesis
filter SSM, but after the feedback of the LPC residual to the
LTP-filter 540 or the adaptive codebook 590, respectively.
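
One possible reading of the second method is sketched below. The description does not define the interpolation or the smoothing, so both are assumptions: the inserted sample takes the mean of its two neighbours, and a removed sample is blended half-and-half into each neighbour. The function names are hypothetical.

```python
def insert_interpolated(residual, position):
    """Insert one sample at `position`, valued as the mean of its neighbours."""
    left = residual[position - 1] if position > 0 else 0.0
    right = residual[position] if position < len(residual) else 0.0
    return residual[:position] + [0.5 * (left + right)] + residual[position:]


def remove_smoothed(residual, position):
    """Remove the sample at `position` and blend it into both neighbours."""
    removed = residual[position]
    out = residual[:position] + residual[position + 1:]
    if position > 0:
        out[position - 1] = 0.5 * (out[position - 1] + removed)
    if position < len(out):
        out[position] = 0.5 * (out[position] + removed)
    return out


print(insert_interpolated([1.0, 3.0], 1))   # [1.0, 2.0, 3.0]
print(remove_smoothed([1.0, 2.0, 3.0], 1))  # [1.5, 2.5]
```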



In the third method the SRC/RCM-modules 545 are placed within
the feedback loop of the speech decoder, see figures 5G-5J,
instead of after the feedback loop as in the previous methods.
Placing the SRC/RCM-modules within the feedback loop uses real
LPC residual samples for the sample rate conversion, by
changing the number of components in the LPC-residual. The
implementation differs depending on whether an
analysis-by-synthesis speech encoder with LTP filter, shown in figure 5A,
or an analysis-by-synthesis speech encoder with adaptive
codebook, shown in figure 5B, is used.

For the speech decoder with LTP filter, see figure 5A, the
SRC/RCM-modules 545 can be placed within the feedback loop in
two different ways, either within the LTP feedback loop as
shown in figure 5G or in the output from the fixed codebook 530
as shown in figure 5H. For the speech decoder with adaptive
codebook, see figure 5B, the SRC/RCM can also be placed in two
different ways, i.e. either before, figure 5J, or after,
figure 5I, the summation of the outputs from the adaptive and the
fixed codebook.



The alterations on the LPC residual consist of removing or
adding samples just as before, but since the SRC/RCM-modules 545
are placed within the LTP feedback loop some modifications must
be made. The extending or shortening of a segment can be done
in three ways: at either of the respective ends of the segment or
somewhere in the middle of the segment. Figure 7 shows the case
where the LPC residual is extended by copying two overlapping
segments, segment 1 and segment 2, from the history of the LPC
residual to create the longer LPC residual. The normal case
when no insertion or deletion is needed would be to copy N
samples. Shortening the LPC residual is achieved by copying two
segments that have a gap between them instead of being
overlapped. As before, it is important that a pitch pulse is
not doubled or removed since this would introduce perceptual
artefacts. Hence, an analysis should be performed in order to
evaluate where to add or remove segments. This analysis is
preferably made by using the same methods as discussed above
regarding how to find the position where to add or remove a
sample in the RCM-module.
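
The copying of overlapping or gapped segments can be sketched as follows. The function name and parameters are hypothetical, the overlap or gap is fixed at one sample, and the sketch assumes the lag is at least N so that the copy never needs samples that have not been produced yet.

```python
def copy_from_history(history, lag, n, cut, change):
    """Build a residual segment of n + change samples from the history.

    change = 0 copies n samples unchanged, +1 lets segment 1 and
    segment 2 overlap by one sample, -1 leaves a one-sample gap.
    `cut` is the index inside the n-sample source where the segments meet.
    """
    if lag < n or lag > len(history):
        raise ValueError("this sketch requires n <= lag <= len(history)")
    if change not in (-1, 0, 1) or not 1 <= cut < n:
        raise ValueError("change must be -1, 0 or +1 and 1 <= cut < n")
    start = len(history) - lag
    source = history[start:start + n]
    segment_1 = source[:cut]
    segment_2 = source[cut - change:]
    return segment_1 + segment_2


history = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(copy_from_history(history, lag=6, n=4, cut=2, change=0))   # [4, 5, 6, 7]
print(copy_from_history(history, lag=6, n=4, cut=2, change=+1))  # [4, 5, 5, 6, 7]
print(copy_from_history(history, lag=6, n=4, cut=2, change=-1))  # [4, 5, 7]
```

In practice `cut` would be chosen by one of the position-finding methods so that the duplicated or skipped sample lies in a low-energy region away from a pitch pulse.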



For all implementations except when the SRC/RCM-modules 545 are
placed between the fixed codebook 530 and the LTP filter 540,
the history of the LPC residual also has to be modified. The
lag L will be increased or decreased for the specific part of
the history where a sample is inserted or deleted. Thus the
starting position of the segment that will be copied from the
history of the LPC residual, Pointer 1 or Pointer 2 in figure
8, needs modification. If the segment to copy is newer, i.e.
the case of Pointer 1, there is no need to modify the starting
position. If, however, the segment to copy is older, i.e. the
case of Pointer 2, then the pointer should be increased or
decreased depending on whether a sample is inserted or deleted.
This has to be managed for subsequent sub-frames and frames as long
as the modification is within the history of the LPC residual.
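
The pointer rule can be expressed compactly under one assumed convention: both the pointer and the modified position are measured as a number of samples back from the newest sample in the history. The function name and this convention are hypothetical; figure 8 may use a different representation.

```python
def adjust_lag_pointer(pointer_lag, modification_lag, change):
    """Shift a history pointer when a sample was inserted (+1) or deleted (-1).

    Only pointers to material older than the modified position move;
    newer material keeps its distance to the newest sample.
    """
    if pointer_lag > modification_lag:
        return pointer_lag + change
    return pointer_lag


# A sample was inserted 50 samples back in the history.
print(adjust_lag_pointer(80, 50, +1))  # 81, older segment (Pointer 2 case)
print(adjust_lag_pointer(30, 50, +1))  # 30, newer segment (Pointer 1 case)
```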

When the SRC/RCM-modules are placed before the summation of the 
outputs from the adaptive and the fixed codebook as shown in 
figure 5J the length of the fixed codebook also needs to be 
changed. This is done by adding a sample, preferably a zero 
sample, in the output from the fixed codebook or removing one 
of the components. The insertion and deletion in the fixed 
codebook should be synchronised with the insertion and deletion 
in the adaptive codebook. 

The invention being thus described, it will be obvious that the
same may be varied in many ways. Such variations are not to be
regarded as a departure from the scope of the invention, and
all such modifications as would be obvious to a person skilled
in the art are intended to be included within the scope of the
following claims.



